The performance of a lossy data compression scheme for uniformly biased Boolean messages is 
investigated via methods of statistical mechanics. Inspired by a formal similarity to the storage 
capacity problem in neural network research, we utilize a perceptron of which the transfer function 
is appropriately designed in order to compress and decode the messages. Employing the replica 
method, we analytically show that our scheme can achieve the optimal performance known in the 
framework of lossy compression in most cases when the code length becomes infinite. The validity 
of the obtained results is numerically confirmed. 
I. INTRODUCTION 

Recent active research on error-correcting codes (ECC) 
has revealed a great similarity between information the- 
ory (IT) and statistical mechanics (SM) @, § |, @, |, |, §. 
As some of these studies have shown that methods from 
SM can be useful in IT, it is natural to expect that a sim- 
ilar approach may also bring about novel developments 
in fields other than ECC. 

The purpose of the present paper is to offer such an 
example. More specifically, we herein employ methods 
from SM to analyze and develop a scheme of data com- 
pression. Data compression is generally classified into 
two categories; lossless and lossy compression ||. The 
purpose of lossless compression is to reduce the size of 
messages in information representation under the con- 
straint of perfect retrieval. The message length in the 
framework of lossy compression can be further reduced 
by allowing a certain amount of distortion when the orig- 
inal expression is retrieved. 

The possibility of lossless compression was first pointed 
out by Shannon in 1948 in the source coding theorem 
whereas the counterpart of lossy compression, termed the 
rate- distortion theorem, was presented in another paper 
by Shannon more than ten years later |l(| . Both of these 
theorems provide the best possible compression perfor- 
mance in each framework. However, their proofs are 
not constructive and suggest few clues for how to de- 
sign practical codes. After much effort had been made 
for achieving the optimal performance in practical time 
scales, a practical lossless compression code that asymp- 
totically saturates the source-coding limit was discovered 



||H| . Nevertheless, thus far, regarding lossy compression, 
no algorithm which can be performed in a practical time 
scale saturating the optimal performance predicted by 
the rate-distortion theory has been found, even for sim- 
ple information sources. Therefore, the quest for better 
lossy compression codes remains one of the important 
problems in IT §, ||, [if, |L§ . 

Therefore, we focus on designing an efficient lossy com- 
pression code for a simple information source of uniformly 
biased Boolean sequences. Constructing a scheme of data 
compression requires implementation of a map from com- 
pressed data of which the redundancy should be min- 
imized, to the original message which is somewhat bi- 
ased and, therefore, seems redundant. However, since the 
summation over the Boolean field generally reduces the 
statistical bias of the data, constructing such a map for 
the aforementioned purpose by only linear operations is 
difficult, although the best performance can be achieved 
by such linear maps in the case of ECC jj], [|, ^, 
and lossless compression |jL5| . In contrast, producing a 
biased output from an unbiased input is relatively easy 
when a non-linear map is used. Therefore, we will employ 
a perceptron of which the transfer function is optimally 
designed in order to devise a lossy compression scheme. 

The present paper is organized as follows. In the next
section, we briefly introduce the framework of lossy data
compression, providing the optimal compression performance,
which is often expressed as the rate-distortion function,
in the case of uniformly biased Boolean sequences. In
section III, we explain how to employ a non-monotonic
perceptron to compress and decode a given message. The
ability and limitations of the proposed scheme are examined
using the replica method in section IV. Due to a specific
(mirror) symmetry that we impose on the transfer function
of the perceptron, one can analytically show that the
proposed method can saturate the rate-distortion function
for most choices of parameters when the code length becomes
infinite. The obtained results are numerically validated by
means of extrapolation from data for systems of finite size
in section V. The final section is devoted to summary and
discussion.



II. LOSSY DATA COMPRESSION 

Let us first provide the framework of lossy data compression.
In a general scenario, a redundant original message of $M$
random variables $y = (y^1, y^2, \ldots, y^M)$, which we assume
here to be a Boolean sequence $y^\mu \in \{0, 1\}$, is compressed
into a shorter (Boolean) expression $s = (s_1, s_2, \ldots, s_N)$
($s_i \in \{0, 1\}$, $N < M$). In the decoding phase, the compressed
expression $s$ is mapped to a representative message
$\tilde{y} = (\tilde{y}^1, \tilde{y}^2, \ldots, \tilde{y}^M)$
($\tilde{y}^\mu \in \{0, 1\}$) in order to retrieve the original
expression (Fig. 1).





[Figure 1: block diagram with an Encoder, the compressed expression $s$, and a Decoder.]

FIG. 1: Encoder and decoder in the framework of lossy compression.
The retrieved sequence $\tilde{y}$ need not be identical to
the original sequence $y$.



In the source coding theorem, it is shown that perfect
retrieval $\tilde{y} = y$ is possible if the compression rate
$R = N/M$ is greater than the entropy per bit of the
message $y$ when the message lengths $M$ and $N$ become
infinite. On the other hand, in the framework of lossy
data compression, the achievable compression rate can
be further reduced by allowing a certain amount of distortion
between the original and representative messages $y$
and $\tilde{y}$.

A measure to evaluate the distortion is termed the distortion
function, which is denoted as $d(y, \tilde{y}) \ge 0$. Here,
we employ the Hamming distance

$$ d(y, \tilde{y}) = \sum_{\mu=1}^{M} d(y^\mu, \tilde{y}^\mu), \tag{1} $$

where

$$ d(y^\mu, \tilde{y}^\mu) = \begin{cases} 0 & \text{if } \tilde{y}^\mu = y^\mu, \\ 1 & \text{if } \tilde{y}^\mu \ne y^\mu, \end{cases} \tag{2} $$

as is frequently used for Boolean messages.

Since the original message $y$ is assumed to be generated
randomly, it is natural to evaluate the average of Eq. (1).
This can be performed by averaging $d(y, \tilde{y})$ with respect
to the joint probability of $y$ and $\tilde{y}$ as

$$ \overline{d(y, \tilde{y})} = \sum_{y} \sum_{\tilde{y}} P(y, \tilde{y})\, d(y, \tilde{y}). \tag{3} $$

By allowing the average distortion per bit $\overline{d(y, \tilde{y})}/M$
up to a given permissible error level $0 < D < 1$, the
achievable compression rate can be reduced below the
entropy per bit. This limit $R(D)$ is termed the rate-distortion
function, which provides the optimal compression
performance in the framework of lossy compression.

The rate-distortion function is formally obtained as a
solution of a minimization problem with respect to the
mutual information between $y$ and $\tilde{y}$. Unfortunately,
solving the problem is generally difficult and analytical
expressions of $R(D)$ are not known in most cases.

The uniformly biased Boolean message in which each
component is generated independently from an identical
distribution $P(y^\mu = 1) = 1 - P(y^\mu = 0) = p$ is one of
the exceptional models for which $R(D)$ can be analytically
obtained. For this simple source, the rate-distortion
function becomes

$$ R(D) = H_2(p) - H_2(D), \tag{4} $$

where $H_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$.

However, it should be noted here that a practical
code that saturates this limit has not yet been reported,
even for this simplest model. Therefore, in the following,
we focus on this information source and look for a code
that saturates Eq. (4), examining properties required for
good compression performance.



III. COMPRESSION BY PERCEPTRON 

In a good compression code for the uniformly biased 
source, it is conjectured that compressed expressions s 
should have the following properties: 

(I) In order to minimize loss of information in the orig- 
inal expressions, the entropy per bit in s must be 
maximized. This implies that the components of s 
are preferably unbiased and uncorrelated. 

(II) In order to reduce the distortion, the representative 
message $\tilde{y}(s)$ should be placed close to the typical 
sequences of the original messages, which are biased. 

Unfortunately, it is difficult to construct a code that 
satisfies both of the above two requirements utilizing only 
linear transformations over the Boolean field while such 
maps provide the optimal performance in the case of ECC 
0, |, ||, 0] and lossless compression [Jl5[ . This is be- 
cause a linear transformation generally reduces statistical 
bias in messages, which implies that the second require- 
ment (II) cannot be realized for unbiased and uncorre- 
lated compressed expressions s that are preferred in the 
first requirement (I). 

One possible method to design a code that has the 
above properties is to introduce a non-linear transforma- 
tion. A perceptron provides one of the simplest schemes 
for carrying out this task. 

In order to simplify notations, let us replace all the
Boolean expressions $\{0, 1\}$ with binary ones $\{1, -1\}$. By
this, we can construct a non-linear map from the compressed
message $s$ to the retrieved sequence $\tilde{y}$ utilizing a
perceptron as

$$ \tilde{y}^\mu = f\left( \frac{1}{\sqrt{N}}\, s \cdot x^\mu \right) \quad (\mu = 1, 2, \ldots, M), \tag{5} $$

where $x^{\mu=1,2,\ldots,M}$ are fixed $N$-dimensional vectors that
specify the map and $f(\cdot)$ is a transfer function from a real
number to a binary variable $\tilde{y}^\mu \in \{1, -1\}$ that should be
optimally designed.

Since each component of the original message $y$ is
produced independently, it is preferable to minimize the
correlations among components of a representative vector
$\tilde{y}$, which intuitively indicates that random selection
of $x^\mu$ may provide good performance. Therefore, we
hereafter assume that the vectors $x^{\mu=1,2,\ldots,M}$ are
independently drawn from the $N$-dimensional normal distribution
$P(x) = (2\pi)^{-N/2} \exp\left[ -|x|^2/2 \right]$.

Based on the non-linear map (5), a lossy compression
scheme can be defined as follows:

• Compression: For a given message $y$, find a
vector $s$ that minimizes the distortion $d(y, \tilde{y}(s))$,
where $\tilde{y}(s)$ is the representative vector that is
generated from $s$ by Eq. (5). The obtained $s$ is the
compressed message.

• Decoding: Given the compressed message $s$, the
representative vector $\tilde{y}(s)$ produced by Eq. (5)
provides the approximate message for the original message.
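
The two steps above can be sketched in a few lines of Python. This is a toy illustration rather than the authors' implementation: the function names (`sign`, `decode`, `hamming`, `compress`), the sizes N = 8 and M = 16, and the random seed are all assumptions made for the example, and the monotonic `sign` function stands in for the transfer function $f$, whose design is discussed in the text that follows.

```python
import itertools
import math
import random


def sign(u):
    """Monotonic transfer function: +1 for u > 0, -1 otherwise."""
    return 1 if u > 0 else -1


def decode(s, xs, f):
    """Perceptron map of Eq. (5): s (entries +1/-1) -> representative message."""
    scale = 1.0 / math.sqrt(len(s))
    return [f(scale * sum(si * xi for si, xi in zip(s, x))) for x in xs]


def hamming(y, y_tilde):
    """Number of positions in which the two +1/-1 sequences differ."""
    return sum(a != b for a, b in zip(y, y_tilde))


def compress(y, xs, f):
    """Exhaustive search over all 2^N vectors s; feasible only for small N."""
    n = len(xs[0])
    return min(itertools.product((1, -1), repeat=n),
               key=lambda s: hamming(y, decode(s, xs, f)))


rng = random.Random(0)
N, M = 8, 16                       # toy sizes (assumed), rate R = N/M = 0.5
xs = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
y = [rng.choice((1, -1)) for _ in range(M)]   # unbiased message, p = 1/2
s = compress(y, xs, sign)
print(s, hamming(y, decode(s, xs, sign)) / M)  # compressed vector, distortion per bit
```

The search cost grows as $2^N$, which is why only small systems can be handled this way.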

Here, we should notice that the formulation of the current
problem has become somewhat similar to that for
the storage capacity evaluation of the Ising perceptron
[16], [17], regarding $s$, $x^\mu$ and $y^\mu$ as "Ising couplings",
"random input pattern" and "random output", respectively.
Actually, the rate-distortion limit in the current framework
for $D = 0$ and $p = 1/2$ can be calculated as the
inverse of the storage capacity of the Ising perceptron,
$\alpha_c^{-1}$.

This observation implies that the simplest choice of
the transfer function $f(u) = \mathrm{sign}(u)$, where $\mathrm{sign}(u) = 1$
for $u > 0$ and $-1$ otherwise, does not saturate the
rate-distortion function (4). This is because the well-known
storage capacity of the simple Ising perceptron,
$\alpha_c = M/N \approx 0.83$, means that the "compression limit"
achievable by this monotonic transfer function becomes
$R_c = N/M = \alpha_c^{-1} \approx 1.20$, far from the value provided
by Eq. (4) for this parameter choice, $R(D = 0) =
H_2(p = 1/2) - H_2(D = 0) = 1$. We also examined the
performance obtained by the monotonic transfer function
for biased messages $0 < p < 1/2$ by introducing an
adaptive threshold in our previous study [18] and found
that the discrepancy from the rate-distortion function becomes
large, in particular for relatively high $R$, while fairly
good performance is observed in low-rate regions.

Therefore, we have to design a non-trivial function $f(\cdot)$ 
in order to achieve the rate-distortion limit, which may 



seem hopeless as there are infinitely many degrees of free- 
dom to be tuned. However, a useful clue exists in the 
literature of perceptrons, which have been investigated 
extensively during the last decade. 

In the study of neural networks, it is widely known that
employing a non-monotonic transfer function can greatly
increase the storage capacity of perceptrons [19]. In particular,
Bex et al. reported that the capacity of the Ising
perceptron that has a transfer function of the reversed-wedge
type $f(u) = f_{\rm RW}(u) = \mathrm{sign}(u - k)\,\mathrm{sign}(u)\,\mathrm{sign}(u + k)$
can be maximized to $\alpha_c = 1$ by setting $k = \sqrt{2 \ln 2}$
[20], which implies that the rate-distortion limit $R = 1$ is
achieved for the case of $p = 1/2$ and $D = 0$ in the current
context. Although not explicitly pointed out in their paper,
the most significant feature observed for this parameter
choice is that the Edwards-Anderson (EA) order parameter
$(1/N) |\langle s \rangle|^2$ vanishes, where $\langle \cdots \rangle$ denotes
the average over the posterior distribution given $y$ and
$x^{\mu=1,2,\ldots,M}$. This implies that the dynamical variable $s$
in the posterior distribution given $y$ and $x^{\mu=1,2,\ldots,M}$ is
unbiased and, therefore, the entropy is maximized, which
meets the first requirement (I) addressed above. Thus,
designing a transfer function $f(u)$ so as to make the EA
order parameter vanish seems promising as the first guideline
for constructing a good compression code.

However, the reversed-wedge type transfer function
$f_{\rm RW}(u)$ is not fully satisfactory for the present purpose.
This is because this function cannot produce a biased sequence
due to the symmetry $f_{\rm RW}(-u) = -f_{\rm RW}(u)$, which
means that the second requirement (II) provided above
would not be satisfied for $p \ne 0.5$.

Hence, another candidate for which the EA parameter
vanishes and the bias of the output can be easily controlled
must be found. A function that provides these
properties was once introduced for reducing noise in signal
processing, namely $f_{\rm LA}(u) = \mathrm{sign}(k - |u|)$ [21], [22]
(Fig. 2). Since this locally activated function has mirror
symmetry $f_{\rm LA}(-u) = f_{\rm LA}(u)$, both $s$ and $-s$ provide
identical output for any input, which means that the EA
parameter is likely to be zero. Moreover, one can easily
control the bias of output sequences by adjusting the
value of the threshold parameter $k$. Therefore, this transfer
function looks highly promising as a useful building
block for constructing a good compression code.
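
To make the bias control concrete: if the argument $u$ of the transfer function is a standard Gaussian variable (as $(1/\sqrt{N})\, s \cdot x^\mu$ is for Gaussian $x^\mu$ and binary $s$), the output equals $+1$ with probability $P(|u| < k) = \mathrm{erf}(k/\sqrt{2})$. The sketch below checks this by sampling and inverts the relation to pick $k$ for a desired output bias. The helper names `f_la`, `prob_plus` and `threshold_for` are hypothetical, introduced only for this illustration.

```python
import math
import random
from statistics import NormalDist


def f_la(u, k):
    """Locally activated transfer function sign(k - |u|), values +1/-1."""
    return 1 if abs(u) < k else -1


def prob_plus(k):
    """P(f_la(u, k) = +1) for u ~ N(0, 1), i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


def threshold_for(a):
    """Threshold k that gives P(output = +1) = a, for 0 < a < 1."""
    return NormalDist().inv_cdf((1.0 + a) / 2.0)


k = threshold_for(0.2)              # target: 20% of outputs equal to +1
rng = random.Random(1)
samples = [f_la(rng.gauss(0.0, 1.0), k) for _ in range(100_000)]
empirical = samples.count(1) / len(samples)
print(k, prob_plus(k), empirical)
```

Because `f_la(u, k)` depends on `u` only through `abs(u)`, flipping the sign of every component of $s$ leaves the output unchanged, which is the mirror symmetry used in the argument above.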

In the following two sections, we examine the validity 
of the above speculation, analytically and numerically 
evaluating the performance obtained by the locally activated 
transfer function $f_{\rm LA}(u)$. 

IV. ANALYTICAL EVALUATION 

We here analytically evaluate the typical performance 
of the proposed compression scheme using the replica 
method. Our goal is to calculate the minimum permis- 
sible average distortion D when the compression rate 
R = N/M is fixed. The analysis is similar to that of 
the storage capacity for perceptrons. 
FIG. 2: Input-output relation of $f_{\rm LA}(u)$. 



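Employing the Ising spin expression, the Hamming distortion
can be represented as

$$ d(y, \tilde{y}(s)) = \sum_{\mu=1}^{M} \left\{ 1 - \Theta_k(u^\mu; y^\mu) \right\}, \tag{6} $$

where

$$ \Theta_k(u; 1) = 1 - \Theta_k(u; -1) = \begin{cases} 1, & \text{for } |u| < k \\ 0, & \text{otherwise} \end{cases}, \tag{7} $$

$$ u^\mu = \frac{1}{\sqrt{N}}\, s \cdot x^\mu. \tag{8} $$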



Then, for a given original message $y$ and vectors
$x^{\mu(=1,2,\ldots,M)}$, the number of dynamical variables $s$ which
provide a fixed Hamming distortion $d(y, \tilde{y}(s)) = MD$
($0 < D < 1$) can be expressed as

$$ \mathcal{N}(D) = \mathop{\rm Tr}_{s}\, \delta\left( MD - d(y, \tilde{y}(s)) \right). \tag{9} $$



Since $y$ and $x^\mu$ are randomly generated predetermined
variables, the quenched average of the entropy per bit
over these parameters,

$$ S(D) = \frac{\langle \ln \mathcal{N}(D) \rangle_{y,x}}{N}, \tag{10} $$

to which the raw entropy per bit $(1/N) \ln \mathcal{N}(D)$ becomes
identical for most realizations of $y$ and $x^\mu$, is
naturally introduced for investigating the typical properties.
This can be evaluated by the replica method,
$(1/N) \langle \ln \mathcal{N}(D) \rangle_{y,x} = \lim_{n \to 0} (1/nN) \ln \langle \mathcal{N}^n(D) \rangle_{y,x}$,
analytically continuing the expressions of $\langle \mathcal{N}^n(D) \rangle_{y,x}$
obtained for natural numbers $n$ to non-negative real numbers
$n$ [23], [24].
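
The identity underlying the replica method, $\langle \ln X \rangle = \lim_{n \to 0} (1/n) \ln \langle X^n \rangle$, can be checked numerically for any positive random variable. The sketch below does so for a log-normal toy variable; this choice is an assumption made purely for illustration and says nothing about the actual distribution of $\mathcal{N}(D)$.

```python
import math
import random

rng = random.Random(0)
# Toy positive random variable (assumed): X = exp(g) with g ~ N(1, 0.5^2).
xs = [math.exp(rng.gauss(1.0, 0.5)) for _ in range(20_000)]

# Quenched average <ln X> over the samples.
quenched = sum(math.log(x) for x in xs) / len(xs)


def replica_estimate(n):
    """(1/n) ln <X^n>, which tends to <ln X> as n -> 0."""
    moment = sum(x ** n for x in xs) / len(xs)
    return math.log(moment) / n


for n in (1.0, 0.1, 0.01, 0.001):
    print(n, replica_estimate(n), quenched)
```

For $n = 1$ the estimate is the annealed value $\ln \langle X \rangle$, which exceeds $\langle \ln X \rangle$ by Jensen's inequality; the gap closes as $n$ decreases.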

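When $n$ is a natural number, $\mathcal{N}^n(D)$ can be expanded
to a summation over $n$-replicated systems as $\mathcal{N}^n(D) =
\mathop{\rm Tr}_{s^1, s^2, \ldots, s^n} \prod_{a=1}^{n} \delta\left( MD - d(y, \tilde{y}(s^a)) \right)$,
where the index $a$ denotes a replica index. Inserting the identity

$$ 1 = \prod_{a>b} \int_{-\infty}^{+\infty} dq_{ab}\, \delta\left( s^a \cdot s^b - N q_{ab} \right)
     = \left( \frac{1}{2\pi i} \right)^{n(n-1)/2} \int_{-i\infty}^{+i\infty} \prod_{a>b} d\hat{q}_{ab} \int_{-\infty}^{+\infty} \prod_{a>b} dq_{ab}\,
       \exp\left[ \sum_{a>b} \hat{q}_{ab} \left( s^a \cdot s^b - N q_{ab} \right) \right] \tag{11} $$

into this expression and utilizing the Fourier expression
of the delta function

$$ \delta\left( MD - d(y, \tilde{y}(s^a)) \right) = \int_{-i\infty}^{+i\infty} \frac{d\beta_a}{2\pi i} \exp\left[ \beta_a \left( MD - d(y, \tilde{y}(s^a)) \right) \right], \tag{12} $$

we can calculate the moment $\langle \mathcal{N}^n(D) \rangle_{y,x}$ for natural
numbers $n = 1, 2, 3, \ldots$ as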



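$$ \langle \mathcal{N}^n(D) \rangle_{y,x} = \int \prod_{a} d\beta_a \int \prod_{a>b} d\hat{q}_{ab} \int \prod_{a>b} dq_{ab}\,
   \exp N \Bigg[ R^{-1} \ln \Big\langle \int dv\, du\, \exp\Big( -\frac{1}{2} v^{T} Q v + i v \cdot u \Big)
   \prod_{a=1}^{n} \left\{ e^{-\beta_a} + (1 - e^{-\beta_a}) \Theta_k(u^a; y) \right\} \Big\rangle_y $$
$$ \qquad + \ln \mathop{\rm Tr}_{s^1, \ldots, s^n} \exp\Big[ \sum_{a>b} \hat{q}_{ab} s^a s^b \Big]
   - \sum_{a>b} \hat{q}_{ab} q_{ab} + R^{-1} D \sum_{a} \beta_a \Bigg], \tag{13} $$

where $Q$ is an $n \times n$ matrix whose elements are given by the parameters $\{q_{ab}\}$ and
$\langle \cdots \rangle_y = \sum_{y=\pm 1} \left( p\, \delta(y - 1) + (1 - p)\, \delta(y + 1) \right) (\cdots)$.

In the thermodynamic limit $N, M \to \infty$, keeping the
compression rate $R$ finite, this integral can be evaluated
via a saddle point problem with respect to the macroscopic
variables $q_{ab}$, $\hat{q}_{ab}$ and $\beta_a$.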

In order to proceed further, a certain ansatz about
the symmetry of the replica indices must be assumed.
We here assume the simplest one, that is, the replica
symmetric (RS) ansatz

$$ \beta_a = \beta, \quad q_{ab} = q, \quad \hat{q}_{ab} = \hat{q} \quad (\forall a > b), \tag{14} $$

for which the saddle point expression of Eq. (13) is likely
to hold for any real number $n$. Taking the limit $n \to 0$
of this expression, we obtain



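$$ S(D) = \lim_{n \to 0} \frac{\ln \langle \mathcal{N}^n(D) \rangle_{y,x}}{nN}
   = \mathop{\rm extr}_{\beta, q, \hat{q}} \Bigg\{ R^{-1} \Big[ p \int Dt\, \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) \left( H(w_1) - H(w_2) \right) \right\} $$
$$ \qquad + (1 - p) \int Dt\, \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) \left( -H(w_1) + H(w_2) + 1 \right) \right\} \Big]
   - \frac{\hat{q}(1 - q)}{2} + \int Du\, \ln\left( 2 \cosh \sqrt{\hat{q}}\, u \right) + R^{-1} \beta D \Bigg\}, \tag{15} $$

where $w_1 = (-k - \sqrt{q}\, t)/\sqrt{1 - q}$, $w_2 = (k - \sqrt{q}\, t)/\sqrt{1 - q}$,
$Dx = \frac{dx}{\sqrt{2\pi}} \exp(-x^2/2)$ and $H(x) = \int_x^{\infty} Dt$;
$\mathop{\rm extr}\{\cdots\}$ denotes the extremization. Under this RS ansatz,
the macroscopic variable $q$ indicates the EA order parameter as
$q = (1/N) |\langle s \rangle|^2$. The validity of this solution will be
examined later.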

Since the dynamical variable $s$ is discrete in the current
system, the entropy (15) must be non-negative. This
indicates that the achievable limit for a fixed compression
rate $R$ and a transfer function $f_{\rm LA}(u)$, which is specified
by the threshold parameter $k$, can be characterized by a
transition depicted in Fig. 3.

Utilizing the Legendre transformation $\beta F(\beta) =
\min_D \{ R^{-1} \beta D - S(D) \}$, the free energy $F(\beta)$ for a fixed
inverse temperature $\beta$, which is an external parameter
and should generally be distinguished from the variational
variable $\beta$ in Eq. (15), can be derived from $S(D)$.
This implies that the distortion $D(\beta)$ that minimizes
$R^{-1} \beta D - S(D)$, whose value is computed from
$F(\beta)$ as $D(\beta) = \partial(\beta F(\beta))/\partial(R^{-1} \beta)$, can be achieved
by randomly drawing $s$ from the canonical distribution
$P(s|y, x^\mu) \sim \exp[-\beta d(y, \tilde{y}(s))]$ specified by the
given $\beta$. For a modest $\beta$, the achieved distortion $D(\beta)$
is determined as the point at which the slope of $S(D)$ becomes
identical to $R^{-1} \beta$ and $S(D) > 0$ (Fig. 3(a)). As
$\beta$ becomes higher, $D(\beta)$ moves to the left, which indicates
that the distortion can be reduced by introducing
a lower temperature. However, at a critical value $\beta_c$
characterized by the condition $S(D(\beta_c)) = 0$ (Fig. 3(b)), the
number of states that achieve $D(\beta_c)$, which is the typical
value of $\min_s \{ d(y, \tilde{y}(s)) \}$, vanishes. Therefore,
for $\beta > \beta_c$, $D(\beta)$ is fixed to $D(\beta_c)$ and the distortion
$D < D(\beta_c)$ is not achievable (Fig. 3(c)).

The above argument indicates that the limit of the
achievable distortion $D(\beta_c)$ for a given rate $R$ and a
threshold parameter $k$ in the current scheme can be evaluated
from the conditions

$$ D(\beta) = \frac{\partial (\beta F(\beta))}{\partial (R^{-1} \beta)}, \tag{16} $$

$$ S(D(\beta)) = 0, \tag{17} $$

being parameterized by the inverse temperature $\beta$.

Due to the mirror symmetry $f_{\rm LA}(-u) = f_{\rm LA}(u)$,
$q = \hat{q} = 0$ becomes the saddle point solution for the
extremization problem (15), as we speculated in the previous
section, and no other solution is discovered. Inserting
$q = \hat{q} = 0$ into the right-hand side of Eq. (15) and employing
the Legendre transformation, the free energy is
obtained as

$$ \beta F(\beta) = -\ln 2 - R^{-1} \left[ p \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) A_k \right\}
   + (1 - p) \ln\left\{ e^{-\beta} + (1 - e^{-\beta}) (1 - A_k) \right\} \right], \tag{18} $$



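where $A_k = 1 - 2H(k)$, which means that Eqs. (16) and
(17) yield

$$ D = p\, \frac{e^{-\beta} - e^{-\beta} A_k}{e^{-\beta} + (1 - e^{-\beta}) A_k}
     + (1 - p)\, \frac{e^{-\beta} - e^{-\beta} (1 - A_k)}{e^{-\beta} + (1 - e^{-\beta}) (1 - A_k)} \tag{19} $$

and

$$ R = -\left[ p \log_2\left\{ e^{-\beta} + (1 - e^{-\beta}) A_k \right\}
       + (1 - p) \log_2\left\{ e^{-\beta} + (1 - e^{-\beta}) (1 - A_k) \right\} \right] $$
$$ \qquad - \frac{\beta}{\ln 2} \left[ p\, \frac{e^{-\beta} - e^{-\beta} A_k}{e^{-\beta} + (1 - e^{-\beta}) A_k}
       + (1 - p)\, \frac{e^{-\beta} - e^{-\beta} (1 - A_k)}{e^{-\beta} + (1 - e^{-\beta}) (1 - A_k)} \right], \tag{20} $$

respectively.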

The rate-distortion function $R(D)$ represents the optimal
performance that can be achieved by appropriately
tuning the scheme of compression. This means that $R(D)$
can be evaluated as the convex hull of a region in the
$D$-$R$ plane defined by Eqs. (19) and (20) by varying the
inverse temperature $\beta$ and the threshold parameter $k$ (or
$A_k$). Minimizing $R$ for a fixed $D$, one can show that the
relations

$$ e^{-\beta} = \frac{D}{1 - D}, \tag{21} $$

$$ e^{-\beta} + (1 - e^{-\beta}) A_k = \frac{p}{1 - D} \tag{22} $$

are satisfied at the convex hull, which offers the optimal
choice of parameters $\beta$ and $k$ as functions of a given
permissible distortion $D$ and a bias $p$. Plugging these into
Eq. (20), we obtain

$$ R = R_{\rm RS}(D) = -p \log_2 p - (1 - p) \log_2 (1 - p) + D \log_2 D + (1 - D) \log_2 (1 - D)
     = H_2(p) - H_2(D), \tag{23} $$

which is identical to the rate-distortion function for uniformly
biased binary sources (4).
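
This last step can be checked numerically. The sketch below evaluates the distortion and rate on the $q = \hat{q} = 0$ solution, reading Eqs. (19) and (20) as $D = -\partial G/\partial \beta$ and $R = -[G + \beta D]/\ln 2$ with $G = p \ln\{e^{-\beta} + (1 - e^{-\beta}) A_k\} + (1 - p) \ln\{e^{-\beta} + (1 - e^{-\beta})(1 - A_k)\}$, and takes the optimal parameters as $e^{-\beta} = D/(1 - D)$ and $A_k = (p - D)/(1 - 2D)$. That reading of the relations, the function names and the test points are assumptions made for this illustration.

```python
import math


def h2(x):
    """Binary entropy in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def distortion(beta, p, a_k):
    """Distortion on the q = q_hat = 0 solution (Eq. (19) as read above)."""
    e = math.exp(-beta)
    plus = (e - e * a_k) / (e + (1 - e) * a_k)
    minus = (e - e * (1 - a_k)) / (e + (1 - e) * (1 - a_k))
    return p * plus + (1 - p) * minus


def rate(beta, p, a_k):
    """Rate at which the entropy vanishes (Eq. (20) as read above)."""
    e = math.exp(-beta)
    logs = (p * math.log2(e + (1 - e) * a_k)
            + (1 - p) * math.log2(e + (1 - e) * (1 - a_k)))
    return -logs - beta / math.log(2) * distortion(beta, p, a_k)


def optimal_parameters(p, d):
    """beta and A_k at the convex hull, for 0 < d < p <= 1/2."""
    beta = math.log((1 - d) / d)
    a_k = (p - d) / (1 - 2 * d)
    return beta, a_k


for p, d in [(0.5, 0.1), (0.2, 0.05), (0.3, 0.2)]:
    beta, a_k = optimal_parameters(p, d)
    print(p, d, distortion(beta, p, a_k), rate(beta, p, a_k), h2(p) - h2(d))
```

For each pair the second-to-last column should reproduce the last one, and the computed distortion should return the target value of $D$.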

The results obtained thus far indicate that the pro- 
posed scheme achieves the rate-distortion limit when the 
threshold parameter k is optimally adjusted. However, 
since the calculation is based on the RS ansatz, we must 
confirm the validity of assuming this specific solution. We 
therefore examined two possible scenarios for the break- 
down of the RS solution. 

The first scenario is that the local stability against
fluctuations that break the replica symmetry is lost,
which is often termed the de Almeida-Thouless (AT)
instability [25], and can be examined by evaluating the
excitation of the free energy around the RS solution.
As the current RS solution can be simply expressed as
$q = \hat{q} = 0$, the condition for this solution to be stable can
be analytically obtained as

$$ R > R_{\rm AT}(D) = \frac{1}{p(1 - p)} \left\{ \frac{2k(1 - 2D)}{\sqrt{2\pi}}\, e^{-k^2/2} \right\}^2. \tag{24} $$

In most cases, the RS solution satisfies the above condition
and, therefore, does not exhibit the AT instability.
However, we found numerically that for relatively high
values of distortion, $0.336 \lesssim D < 0.50$, $R_{\rm RS}(D)$ can become
slightly smaller than $R_{\rm AT}(D)$ for a very narrow
parameter region, $0.499 \lesssim p \le 0.5$, which indicates the
necessity of introducing replica symmetry breaking
(RSB) solutions. This is also supported analytically by
the fact that the inequality $R_{\rm AT}(D) \simeq 2.94 \times (p - D)^2 >
R_{\rm RS}(D) \simeq 2.89 \times (p - D)^2$ holds for $p = 0.5$ in the
vicinity of $D \simeq p$. Nevertheless, this instability may not
be serious in practice, because the area of the region
$R_{\rm RS}(D) < R < R_{\rm AT}(D)$, where the RS solution becomes
unstable, is extremely small, as indicated by Fig. 5(a).
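
The quoted coefficients can be reproduced with a short calculation. The sketch below assumes the AT line has the form $R_{\rm AT}(D) = \{2k(1 - 2D)\, e^{-k^2/2}/\sqrt{2\pi}\}^2 / (p(1 - p))$, with $k$ fixed by $A_k = \mathrm{erf}(k/\sqrt{2}) = (p - D)/(1 - 2D)$. This form is a reconstruction that is consistent with the numbers 2.94, 2.89 and 0.336 given in the text; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def h2(x):
    """Binary entropy in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def r_rs(p, d):
    """RS rate, equal to the rate-distortion function H2(p) - H2(D)."""
    return h2(p) - h2(d)


def r_at(p, d):
    """Assumed AT line, with k tuned so that A_k = (p - d) / (1 - 2d)."""
    a_k = (p - d) / (1 - 2 * d)
    k = NormalDist().inv_cdf((1 + a_k) / 2)
    amp = 2 * k * (1 - 2 * d) * math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    return amp * amp / (p * (1 - p))


p = 0.5
for d in (0.1, 0.3, 0.336, 0.4, 0.49):
    # Columns: D, R_RS, R_AT, and R_AT / (p - D)^2.
    print(d, r_rs(p, d), r_at(p, d), r_at(p, d) / (p - d) ** 2)
```

For $p = 0.5$ the threshold is the same for every $D$ (since $A_k = 1/2$), so the ratio in the last column is constant, while the two curves cross near $D \approx 0.336$.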




[Figure 3: three schematic panels of $S(D)$: (a) achievable case, (b) critical case, (c) non-achievable case.]

FIG. 3: Schematic profile of the entropy (per bit) $S(D)$.
(a): For a modest $\beta$, the achieved distortion $D(\beta)$ is the
point at which $\partial S(D)/\partial D = R^{-1} \beta$ holds. This is realized
by random sampling from the canonical distribution
$P(s|y, x^\mu) \sim \exp[-\beta d(y, \tilde{y}(s))]$. (b): At a critical
inverse temperature $\beta = \beta_c$, the entropy for $D(\beta_c)$, which is
the minimum distortion, vanishes. (c): It is impossible
to achieve any distortion smaller than $D(\beta_c)$, as
$S(D) = 0$ for $D < D(\beta_c)$.
The other scenario is the coexistence of an RSB solution
that is thermodynamically dominant while the RS
solution is locally stable. In order to examine this possibility,
we solved the saddle point problem assuming the
one-step RSB (1RSB) ansatz in several cases for which
the RS solution is locally stable. However, no 1RSB solution
was discovered for $R > R_{\rm RS}(D)$. Therefore, we
concluded that this scenario need not be taken into account
in the current system.

These insubstantial roles of RSB may seem somewhat 
surprising since significant RSB effects above the stor- 
age capacity have been reported in the research of per- 
ceptrons with continuous couplings plj . However, 
this may be explained by the fact that, in most cases, 
RSB solutions for Ising couplings can be expressed by the 
RS solutions adjusting temperature appropriately, even 
if non-monotonic transfer functions are used Jlj], . 



V. NUMERICAL VALIDATION 

Although the analysis in the previous section theoret- 
ically indicates that the proposed scheme is likely to ex- 
hibit a good compression performance, it is still impor- 
tant to confirm it by experiments. Therefore, we have 
performed numerical simulations implementing the pro- 
posed scheme in systems of finite size. 

In these experiments, an exhaustive search was performed
in order to minimize the distortion $d(y, \tilde{y}(s))$ so
as to compress a given message $y$ into $s$, which implies
that implementing the current scheme in a large system
is difficult. Therefore, validation was performed by
extrapolating the numerically obtained data, changing the
system size from $N = 4$ to $N = 20$.

Figure 4 shows the average distortions obtained from
5000 ~ 10000 experiments for (a) unbiased ($p = 0.5$)
and (b) biased ($p = 0.2$) messages, varying the system
size $N$ and the compression rate $R$ ($= 0.05 \sim 1.0$). For
each $R$, the threshold parameter $k$ is tuned to the value
determined using Eqs. (21), (22) and the rate-distortion
function $R = R(D)$ in order to optimize the performance.
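
A minimal sketch of this tuning step is given below. It assumes that the threshold is obtained by first solving $R = H_2(p) - H_2(D)$ for $D$, then setting $A_k = (p - D)/(1 - 2D)$ and inverting $A_k = \mathrm{erf}(k/\sqrt{2})$. The function names and the use of bisection are choices made for this illustration, not details taken from the experiments.

```python
import math
from statistics import NormalDist


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def distortion_for_rate(r, p, tol=1e-12):
    """Solve H2(p) - H2(D) = r for D in [0, p] by bisection (p <= 1/2)."""
    lo, hi = 0.0, p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(p) - h2(mid) > r:
            lo = mid      # required rate still above r: allow more distortion
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tuned_threshold(r, p):
    """Return (D, A_k, k) for compression rate r and bias p."""
    d = distortion_for_rate(r, p)
    a_k = (p - d) / (1 - 2 * d)
    k = NormalDist().inv_cdf((1 + a_k) / 2)
    return d, a_k, k


for p, r in [(0.5, 0.5), (0.2, 0.3)]:
    print(p, r, tuned_threshold(r, p))
```

If $r$ exceeds $H_2(p)$ the bisection returns $D \approx 0$, which corresponds to $A_k = p$.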

These data indicate that the finite size effect is relatively
large in the present system, which is similar to
the case of the storage capacity problem [26], and do
not necessarily seem consistent with the theoretical prediction
obtained in the previous section. However, the
extrapolated values obtained from quadratic fitting
with respect to $1/N$ are highly consistent with the curves
of the rate-distortion function (Fig. 5(a) and (b)), including
one point in the region where the AT stability is
broken (inset of Fig. 5(a)), which strongly supports the
validity and efficacy of our calculation based on the RS
ansatz.
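
The extrapolation procedure can be sketched as a least-squares fit of the form $c_0 + c_1/N + c_2/N^2$, whose intercept $c_0$ is the estimate for $N \to \infty$. The routine below is one possible implementation under that assumption; the input values are synthetic numbers generated from a known quadratic, not data from the experiments.

```python
def quadratic_intercept(sizes, values):
    """Least-squares fit values ~ c0 + c1/N + c2/N^2; return c0 (N -> infinity)."""
    xs = [1.0 / n for n in sizes]
    # Normal equations A c = b for the basis (1, x, x^2).
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(v * x ** i for x, v in zip(xs, values)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for j in range(col, 3):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (b[i] - sum(a[i][j] * c[j] for j in range(i + 1, 3))) / a[i][i]
    return c[0]


# Synthetic, noise-free values (NOT measured data), from a known quadratic in 1/N.
sizes = list(range(4, 21, 2))
values = [0.25 + 0.5 / n - 0.3 / n ** 2 for n in sizes]
print(quadratic_intercept(sizes, values))
```

With noisy averages the intercept carries an uncertainty that this sketch does not estimate.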




[Figure 4: two panels of average distortion versus $1/N$, (a) $p = 0.5$ and (b) $p = 0.2$.]

FIG. 4: The averages of the achieved distortions are plotted
as functions of $1/N$ for (a) $p = 0.5$ (unbiased) and (b) $p = 0.2$
(biased) messages, changing the compression rate $R$. The plots
are obtained from 5000 ~ 10000 experiments for $N = 4 \sim 20$,
minimizing the distortion $d(y, \tilde{y}(s))$ by means of exhaustive
search. Each set of plots corresponds to $R = 0.05$ ($p = 0.5$
only), 0.1, 0.2, ..., 1.0, from the top.



VI. SUMMARY AND DISCUSSION 

We have investigated a lossy data compression scheme
for uniformly biased Boolean messages employing a perceptron
whose transfer function is non-monotonic.
Designing the transfer function based on the properties
required for good compression codes, we have constructed
a scheme that saturates the rate-distortion function, which
represents the optimal performance in the framework of
lossy compression, in most cases.

FIG. 5: The limits of the achievable distortion expected for
$N \to \infty$ are plotted versus the code rate $R$ for (a) $p = 0.5$
(unbiased) and (b) $p = 0.2$ (biased) messages. The plots
are obtained by extrapolating the numerically obtained data
for systems of $N = 4 \sim 20$ shown in Fig. 4. The full and
dashed curves represent the rate-distortion functions and the
AT lines, respectively. Although the AT stability is broken
for $D \gtrsim 0.336$ for $p = 0.5$ (inset of (a)), the numerical data are
highly consistent with the RS solution, which corresponds to
the rate-distortion function.



It is known that a non-monotonic single layer percep- 
tron can be regarded as equivalent to certain types of 
multi-layered networks, as in the case of parity and com- 
mittee machines. Although tuning the input-output re- 
lation in multi-layered networks would be more compli- 
cated, employing such devices might be useful in practice 
because several heuristic algorithms that could be used 
for encoding in the present context have been proposed 
and investigated [27], [28]. 

In real-world problems, the redundancy of information
sources is not necessarily represented as a uniform bias,
but rather is often given as non-trivial correlations among
components of a message. Although the direct employment
of the current method may not show good performance
in such cases, the locally activated transfer function
$f_{\rm LA}(u)$ that we have introduced herein could serve as a
useful building block to be used in conjunction with a set
of connection vectors $x^{\mu=1,2,\ldots,M}$ that are appropriately
correlated for approximately expressing the given information
source, because by using this function, we can easily
control the input-output relation while suppressing the
bias of the compressed message to zero, no matter how the
redundancy is represented.



Finally, although we have confirmed that our method
exhibits good performance when executed optimally in
a large system, the computational cost for compressing
a message may render the proposed method impractical.
One promising approach for resolving this difficulty is to
employ efficient approximation algorithms such as various
methods of Monte Carlo sampling [29] and of the
mean field approximation [30]. Another possibility is to
reduce the finite size effect by further tuning the profile
of the transfer function. Investigation of these subjects
is currently under way.